model definition
veScale: Consistent and Efficient Tensor Programming with Eager-Mode SPMD
Li, Youjie, Wan, Cheng, Lin, Zhiqi, Zhu, Hongyu, Yang, Jiacheng, Song, Ziang, Di, Xinyi, Wu, Jiawei, Shu, Huiyao, Bao, Wenlei, Peng, Yanghua, Lin, Haibin, Chang, Li-Wen
Large Language Models (LLMs) have scaled rapidly in size and complexity, requiring increasingly intricate parallelism strategies for distributed training, such as 3D parallelism. This sophistication motivates a shift toward simpler, more debuggable programming paradigms such as Single Program Multiple Data (SPMD). However, SPMD in eager execution introduces two key challenges: ensuring consistency with single-device execution and achieving high performance at scale. In this paper, we introduce veScale, an eager-mode training system that fully embraces the SPMD paradigm to democratize distributed tensor programming. veScale addresses the prevalent issue of inconsistent results in systems like PyTorch by introducing a novel distributed Random Number Generation (RNG) algorithm compatible with arbitrary sharded operators. veScale also significantly boosts training performance by reducing the overhead of PyTorch primitives and improving communication efficiency. Evaluations show that veScale delivers up to 2.2x speedup over state-of-the-art training systems such as TorchTitan and cuts code complexity by 78.4%, while preserving single-device-equivalent results.
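The abstract does not spell out veScale's RNG algorithm. As a hedged illustration of the consistency problem it targets, the sketch below uses a counter-based generator (a swapped-in technique, with SplitMix64 as the assumed mixing function): each random value depends only on the seed and the element's global index, so concatenating per-rank shards reproduces the single-device tensor regardless of the number of ranks. A stateful generator seeded identically on every rank would instead hand each rank the same leading values. All function names here are hypothetical.

```python
# Illustrative sketch only (assumption): NOT veScale's RNG algorithm.
# Each random value is derived from (seed, global element index), so any
# sharding of a 1-D tensor reproduces the single-device values exactly.

MASK64 = (1 << 64) - 1


def splitmix64(x: int) -> int:
    """One SplitMix64 step: mixes a 64-bit integer into a pseudo-random 64-bit integer."""
    x = (x + 0x9E3779B97F4A7C15) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def uniform_at(seed: int, global_index: int) -> float:
    """Uniform float in [0, 1) that depends only on the seed and the global index."""
    bits = splitmix64(splitmix64(seed & MASK64) ^ global_index)
    return (bits >> 11) / float(1 << 53)  # keep the top 53 bits as a double mantissa


def random_full(seed: int, numel: int) -> list[float]:
    """Single-device reference: generate the whole tensor at once."""
    return [uniform_at(seed, i) for i in range(numel)]


def random_shard(seed: int, numel: int, rank: int, world_size: int) -> list[float]:
    """Values for the contiguous slice owned by `rank` under a ceil-division split."""
    shard_size = -(-numel // world_size)  # ceil(numel / world_size)
    start = min(rank * shard_size, numel)
    stop = min(start + shard_size, numel)
    return [uniform_at(seed, i) for i in range(start, stop)]


if __name__ == "__main__":
    seed, numel, world_size = 1234, 10, 4
    full = random_full(seed, numel)
    gathered = [v for r in range(world_size) for v in random_shard(seed, numel, r, world_size)]
    print("shards match single-device result:", gathered == full)
```

The design choice that matters is statelessness: because no rank needs to know how many values other ranks have already drawn, the same idea extends in principle to any sharding layout that can map a local element back to its global index.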
"Definition Modeling: To model definitions." Generating Definitions With Little to No Semantics
Segonne, Vincent, Mickus, Timothee
Definition Modeling, the task of generating definitions, was first proposed as a means to evaluate the semantic quality of word embeddings: a coherent lexical semantic representation of a word in context should contain all the information necessary to generate its definition. Because the task is relatively new, we do not know which factors a Definition Modeling system actually relies upon. In this paper, we present evidence that the task may not involve as much semantics as one might expect: we show that an earlier model from the literature is both rather insensitive to semantic aspects such as explicit polysemy and reliant on formal similarities between headwords and the words occurring in its glosses, casting doubt on the validity of the task as a means to evaluate embeddings.
Variational Auto Encoders (VAE) for the Numerai Dataset
The Numerai dataset contains decades of historical data on the global stock market. Machine learning models trained on the dataset learn to predict stock returns and earn cryptocurrency (NMR) based on their performance in the Numerai Tournament. This blog post first explains "why" a variational autoencoder is a suitable tool in a Numerai model developer's stack. Then, we discuss "what" a variational autoencoder is and show "how" you can train one. VAEs can be used for anomaly detection, denoising, and generating synthetic data.
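A minimal sketch, in plain Python with no framework, of the two pieces that distinguish a VAE from a plain autoencoder: the reparameterization trick and the closed-form KL term of the loss. It assumes a diagonal Gaussian encoder that outputs `mu` and `log_var`, a standard normal prior, and a squared-error reconstruction term (a Gaussian decoder, up to constants). The function names and the `beta` weight are illustrative assumptions, not taken from the blog post; a real training loop would compute these on tensors in a framework with automatic differentiation.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1).

    Writing the sample this way moves the randomness into eps, so gradients can
    flow through mu and log_var in an autodiff framework.
    """
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))


def reconstruction_error(x, x_recon):
    """Sum of squared errors; also usable as a simple anomaly score."""
    return sum((a - b) ** 2 for a, b in zip(x, x_recon))


def vae_loss(x, x_recon, mu, log_var, beta=1.0):
    """Negative ELBO (up to constants): reconstruction term plus beta-weighted KL."""
    return reconstruction_error(x, x_recon) + beta * kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.5, -0.2], [0.0, -1.0]
    z = reparameterize(mu, log_var, rng)
    print("latent sample:", z)
    print("KL term:", kl_to_standard_normal(mu, log_var))
```

For the anomaly-detection use the post mentions, one common approach is to flag rows whose reconstruction error is unusually high; for synthetic data, one samples `z` from N(0, 1) and passes it through the trained decoder.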
A tutorial on building end-to-end Deep Learning models in PyTorch
PyTorch is a powerful framework for building deep learning models. It is less complex to learn than other deep learning frameworks because of its straightforward approach to model building. In this article, we discuss how to build an end-to-end deep learning model, aimed at novice machine learning practitioners. Through this tutorial, we demonstrate how to define and use a convolutional neural network (CNN), explaining each step in detail. The major points covered in this article are listed below.
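When defining a CNN in PyTorch, the `in_features` of the first fully connected layer after the convolutional stack has to match the flattened feature-map size. The helper below applies the output-size formula given in PyTorch's `nn.Conv2d` and `nn.MaxPool2d` documentation (assuming `ceil_mode=False` for pooling) to compute it. The 28x28 input, the layer stack, and the 64 output channels are assumed for illustration and are not the tutorial's architecture; the helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension for Conv2d / MaxPool2d (ceil_mode=False)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def flattened_size(height, width, channels, layers):
    """Track (H, W) through (kernel, stride, padding) layers and return C * H * W."""
    for kernel, stride, padding in layers:
        height = conv_out(height, kernel, stride, padding)
        width = conv_out(width, kernel, stride, padding)
    return channels * height * width


if __name__ == "__main__":
    # Hypothetical stack: conv3x3(pad 1) -> maxpool2x2 -> conv3x3(pad 1) -> maxpool2x2
    stack = [(3, 1, 1), (2, 2, 0), (3, 1, 1), (2, 2, 0)]
    n = flattened_size(28, 28, channels=64, layers=stack)
    print(n)  # 3136 = 64 * 7 * 7
```

With this assumed stack, the classifier head would start with `nn.Linear(3136, ...)`; computing the value rather than hard-coding it avoids shape-mismatch errors when the convolutional layers change.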
Shame on you, if you don't know these Machine Learning tools!
Today we will look into one of the most important aspects of being a productive Machine Learning Specialist or Data Scientist: the best tools, frameworks, and packages for staying relevant as a Machine Learning practitioner. In the ever-evolving world of Machine Learning, things move so fast that continually learning about new tools is crucial. We also go over sub-tools and little helpers that will simplify your life, and here and there include a few lines of code to show what these tools look like in practice. As always, stay in the community flow and share your favorite tools in the comment section so we can all learn what you are excited about.
Ludwig: a type-based declarative deep learning toolbox
Molino, Piero, Dudin, Yaroslav, Miryala, Sai Sumanth
In this work we present Ludwig, a flexible, extensible and easy-to-use toolbox that allows users to train deep learning models and use them to obtain predictions without writing code. Ludwig implements a novel approach to deep learning model building based on two main abstractions: data types and declarative configuration files. The data type abstraction allows for easier code and sub-model reuse, and the standardized interfaces imposed by this abstraction allow for encapsulation and make the code easy to extend. Declarative model definition configuration files enable inexperienced users to obtain effective models and increase the productivity of expert users. Alongside these two innovations, Ludwig introduces a general modularized deep learning architecture called Encoder-Combiner-Decoder that can be instantiated to perform a vast number of machine learning tasks. These innovations make it possible for engineers, scientists from other fields and, in general, a much broader audience to adopt deep learning models for their tasks, concretely helping to democratize deep learning.
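Ludwig's real configuration format and encoders are far richer than anything shown here; this is a hypothetical toy that only illustrates the two abstractions the abstract names. The config is loosely modeled on a Ludwig config's lists of `input_features` and `output_features`, each declaring a `name` and a `type`. In the toy, the type selects an encoder (inputs) or decoder (output), and a concatenation combiner sits between them, mirroring the Encoder-Combiner-Decoder layout. The `vocab` key, the encoder and decoder functions, and the hand-set weights (in place of training) are all assumptions.

```python
from typing import Callable, Dict, List

# Toy illustration (assumption): NOT Ludwig's code or API. It only mimics the
# idea of a declarative config whose feature *types* select an encoder/decoder.

Encoder = Callable[[dict, object], List[float]]

ENCODERS: Dict[str, Encoder] = {
    "number": lambda feat, value: [float(value)],
    "binary": lambda feat, value: [1.0 if value else 0.0],
    # One-hot over a vocabulary listed in the feature's own config entry.
    "category": lambda feat, value: [1.0 if value == c else 0.0 for c in feat["vocab"]],
}


def concat_combiner(encodings: List[List[float]]) -> List[float]:
    """Combiner: concatenate every input feature's encoding into one vector."""
    return [x for vec in encodings for x in vec]


def binary_decoder(hidden: List[float], weights: List[float], bias: float) -> bool:
    """Decoder for a binary output: sign of a linear score."""
    return sum(h * w for h, w in zip(hidden, weights)) + bias > 0.0


DECODERS = {"binary": binary_decoder}


def build_model(config: dict, weights: List[float], bias: float = 0.0):
    """Wire Encoder -> Combiner -> Decoder purely from the declarative config."""
    inputs = config["input_features"]
    (output,) = config["output_features"]  # this toy supports one output feature
    decoder = DECODERS[output["type"]]

    def predict(row: dict) -> dict:
        encodings = [ENCODERS[f["type"]](f, row[f["name"]]) for f in inputs]
        hidden = concat_combiner(encodings)
        if len(hidden) != len(weights):
            raise ValueError(f"expected {len(hidden)} weights, got {len(weights)}")
        return {output["name"]: decoder(hidden, weights, bias)}

    return predict


if __name__ == "__main__":
    config = {
        "input_features": [
            {"name": "age", "type": "number"},
            {"name": "plan", "type": "category", "vocab": ["free", "pro"]},
        ],
        "output_features": [{"name": "churn", "type": "binary"}],
    }
    # Hand-set weights stand in for training: [age, plan=free, plan=pro].
    predict = build_model(config, weights=[0.0, 1.0, -1.0])
    print(predict({"age": 30, "plan": "free"}))
```

In this design, supporting a new data type means registering one more entry in `ENCODERS` or `DECODERS` without touching `build_model`, which is the kind of extensibility the type abstraction is meant to provide.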
r/MachineLearning - [P] Neural Network Model Builder & Visualiser Netbrix.ml
JavaScript library and wanted a project to build up my skills with it! I ended up going with a simple web app for visualising and editing network models, which I've named netbrix.ml. I'd wanted to build something like this for a while, since it seemed like a really good project to improve my web development skills and my understanding of the process of building deep learning models. After reading this post on /r/deeplearning, where the writer gives insight into the modular nature of deep learning with the analogy of a deep learning 'lego set', I was motivated to start work on this with that vision in mind, and I've now got a decent working web app! I'm sure similar tools already exist, so I wanted to keep it as simple as possible and not over-engineer it. It's meant to be easy and simple to use!
cchio/deep-pwning
Researchers have found that it is surprisingly trivial to trick a machine learning model (classifier, clusterer, regressor, etc.) into making an objectively wrong decision. This field of research is called Adversarial Machine Learning. It is not hyperbole to claim that any motivated attacker can bypass any machine learning system, given enough information and time. However, this issue is often overlooked when architects and engineers design and build machine learning systems. The consequences are worrying when these systems are put into use in critical scenarios, such as in the medical, transportation, financial, or security-related fields. Hence, when evaluating the efficacy of applications that use machine learning, one should measure their malleability in an adversarial setting alongside the system's precision and recall. This tool was released at DEF CON 24 in Las Vegas, August 2016, during a talk titled Machine Duping 101: Pwning Deep Learning Systems.
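To make "surprisingly trivial" concrete, here is a minimal sketch of the Fast Gradient Sign Method (FGSM), a standard attack from the adversarial machine learning literature, swapped in for illustration; it is not taken from the deep-pwning tool. Against a two-feature logistic-regression model with assumed (hypothetical) weights, moving each input by at most 0.2 in the direction that increases the loss flips a correct prediction.

```python
import math

# Illustrative only (assumption): a textbook FGSM attack on a tiny logistic-
# regression model, NOT code from the deep-pwning repository.


def predict_proba(w, b, x):
    """P(y = 1 | x) for logistic regression."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def fgsm(w, b, x, y, eps):
    """Fast Gradient Sign Method: step each input by eps in the loss-increasing direction.

    For binary cross-entropy on logistic regression, dLoss/dx = (p - y) * w.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    sign = [(g > 0) - (g < 0) for g in grad]
    return [xi + eps * s for xi, s in zip(x, sign)]


if __name__ == "__main__":
    w, b = [2.0, -1.0], 0.0  # hypothetical trained weights
    x, y = [0.3, 0.2], 1     # correctly classified: p is about 0.60
    x_adv = fgsm(w, b, x, y, eps=0.2)
    print("clean p(y=1):", predict_proba(w, b, x))
    print("adversarial p(y=1):", predict_proba(w, b, x_adv))
```

Deep networks are attacked the same way, with the input gradient obtained by backpropagation instead of the closed form used here.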